Abstract


The questionnaire is one of the most important research tools. Decision makers and researchers across all
academic and industry sectors conduct surveys and questionnaires to uncover answers to specific, significant
questions. Questionnaires and surveys can be effective tools for collecting the data required for
research and evaluation. To develop a survey or questionnaire, the researcher should first decide how
to collect the required data. In this regard, scaling is the branch of measurement that involves the
construction of an instrument. Attitude scales are among the most widely used scaling methods, and the
Likert scale is one of the most fundamental and frequently used psychometric tools in sociology,
psychology, information systems, politics, economics and many other fields. However, studies of research
methodology have not specifically suggested which rating scale is best for a given study. This study
provides an overview of the Likert scale and compares rating scales of different lengths. The results
will help researchers decide how many Likert scale points to use in their surveys and questionnaires.
Taken as a whole, this study suggests using a seven-point rating scale; if respondents need to be
directed to one side, a six-point scale might be the most suitable.
I. INTRODUCTION

Decision makers and researchers across all academic and industry sectors conduct surveys and
questionnaires to uncover answers to specific, significant questions (Taherdoost, 2016a).
Questionnaires and surveys are effective tools for data collection. Once the variables of
interest have been identified and defined conceptually, a specific type of scale must be selected.
Scaling methods are divided into two main categories, open questions and closed questions
(Taherdoost, 2017b). Scaling is the process of generating the continuum, a continuous sequence of
values, upon which the measured objects are placed. A number of factors should be considered when
choosing an appropriate scaling method for a questionnaire.


An open question is one in which the respondent does not have to indicate a specific response
(Taherdoost, 2017a). Open questions tend to generate lengthy answers. Respondents often see
open questions as an opportunity to respond to a question in detail. By contrast, a closed
question is one in which a respondent has to choose from a limited number of potential answers
(Taherdoost, 2016b). Usually this is a straightforward yes or no. Other closed questions may
require the respondent to choose from multiple response options, as in multiple choice questions,
the Likert scale and the semantic differential scale. As articulated by Taherdoost (2017b),
scaling methods can be classified as rating scales and attitude scales. Figure 1 shows some of
the commonly used scaling methods.


Scaling Techniques
    Rating Scales: Graphic Rating, Itemized Rating, Comparative Rating
    Attitude Scales: Likert Scale, Semantic Differential


FIGURE 1: SCALING METHODS


Below is a brief description of each scaling technique:


Rating Scales: Raters evaluate a person, object or other phenomenon at a point along a
continuum or in a category. A numerical value is then assigned to this point or category. Rating
scales are among the most widely used measuring instruments.


Graphic Rating Scales: Raters mark, or indicate in another fashion, how they feel on a
graphic scale of some sort.


Itemized Rating Scales: Raters select one of a limited number of categories that are
ordered in some fashion. The number of categories is usually between 2 and 11.


Comparative Rating Scales: Raters judge a person, object or other phenomenon against
some standard or against some other person, object or phenomenon.


Attitude Scales: Any of a variety of scales that measure an individual’s predisposition
toward any person, object or other phenomenon. These scales differ from rating scales in that
they are generally more complex, multi-item scales.


Likert Scale: Respondents indicate their degree of agreement or disagreement with a variety of
statements about some attitude, object, person or event.


Semantic Differential: Respondents indicate how strongly they hold an attitude. These
scales include a progression from one extreme to another (Taherdoost, 2017b).


To develop a survey or questionnaire, the researcher should first decide how to collect the
required data (Taherdoost, 2018). In this regard, scaling is the branch of measurement that
involves the construction of an instrument. Several questions may arise at this step that the
researcher needs to resolve before developing the survey or questionnaire: Which scaling method
should I choose for the survey or questionnaire? Does the number of response options matter?
How many scales and response categories should be used? Is there an optimal number of
alternatives for Likert scale items? What is the optimal number of response alternatives for a
scale? What number of scale points will improve the reliability of scales? What number of scale
points will improve the validity of the survey? What number of scale points will increase the
response rate? What number of scale points is preferred by respondents? Does item readability
have any impact when a midpoint response is used? Which Likert scale is better to use, 5-point
or 7-point? Which is better, an even or an odd number of response options? Is there any
advantage to using visual analog response scales rather than Likert scales? When should the
midpoint response be endorsed in a survey?


This article provides information to answer these questions by comparing rating scales of
different lengths. The results will help researchers decide how many rating scale points to use
in their surveys and questionnaires. Although the review covers both scaling techniques, rating
scales and attitude scales, the overview is prepared in particular to support the proper
selection of the Likert scale as a technique for measuring attitudes. Thus the terms rating
scale, attitude scale and Likert scale may be used interchangeably in this study.


II. LIKERT SCALE 
Attitude and rating scales are among the most widely used measuring instruments in fields such as
sociology, psychology, information systems, politics and economics. However, research
methodology studies have not provided specific suggestions on the proper selection of a rating
scale for research studies (Krosnick & Fabrigar, 1997). One of the most fundamental and popular
scaling methods used in social science research is the Likert scale.


As with other scaling methods, there is debate about the number of points on a Likert scale.
The Likert scale was developed in 1932 as part of the doctoral dissertation of Rensis Likert
(Likert, 1932). As a psychometric tool, the scale consists of a set of statements about the
research study’s hypothesis. Participants in the survey are asked to state their level of
agreement with the given statements, from strongly agree to strongly disagree. Although the
original Likert scale included five symmetrical and balanced points, over the years it has been
used with different numbers of response options, from two points to eleven points. Simms,
Zelazny, Williams, and Bernstein (2019) summarized the Likert response labels used, as shown in
Table 1.


TABLE 1: LIKERT RESPONSE LABELS


Options   | Response labels, in order from option 1 to the last option
2-points  | Disagree | Agree
3-points  | Disagree | Neither Agree nor Disagree | Agree
4-points  | Strongly Disagree | Disagree | Agree | Strongly Agree
5-points  | Strongly Disagree | Disagree | Neither Agree nor Disagree | Agree | Strongly Agree
6-points  | Strongly Disagree | Disagree | Slightly Disagree | Slightly Agree | Agree | Strongly Agree
7-points  | Strongly Disagree | Disagree | Slightly Disagree | Neither Agree nor Disagree | Slightly Agree | Agree | Strongly Agree
8-points  | Very Strongly Disagree | Strongly Disagree | Disagree | Slightly Disagree | Slightly Agree | Agree | Strongly Agree | Very Strongly Agree
9-points  | Very Strongly Disagree | Strongly Disagree | Disagree | Slightly Disagree | Neither Agree nor Disagree | Slightly Agree | Agree | Strongly Agree | Very Strongly Agree
10-points | Very Strongly Disagree | Strongly Disagree | Disagree | Mostly Disagree | Slightly Disagree | Slightly Agree | Mostly Agree | Agree | Strongly Agree | Very Strongly Agree
11-points | Very Strongly Disagree | Strongly Disagree | Disagree | Mostly Disagree | Slightly Disagree | Neither Agree nor Disagree | Slightly Agree | Mostly Agree | Agree | Strongly Agree | Very Strongly Agree

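
As an illustration of how agreement responses are typically turned into numbers, the following minimal Python sketch codes five-point labels as 1 to 5, reverses negatively worded items, and sums the result. The label list, the function names and the choice of which items are reversed are assumptions made for this example; they are not taken from Likert (1932) or Simms et al. (2019).

```python
# Assumed five-point agreement labels, coded 1 (Strongly Disagree) to 5 (Strongly Agree).
LABELS_5 = [
    "Strongly Disagree",
    "Disagree",
    "Neither Agree nor Disagree",
    "Agree",
    "Strongly Agree",
]
CODES_5 = {label: code for code, label in enumerate(LABELS_5, start=1)}


def reverse_code(score, n_points):
    """Flip a score on a 1..n_points scale (for negatively worded items)."""
    return n_points + 1 - score


def summated_score(responses, reversed_items=(), n_points=5, codes=CODES_5):
    """Sum the coded responses, flipping items whose index is in reversed_items."""
    total = 0
    for index, label in enumerate(responses):
        score = codes[label]
        if index in reversed_items:
            score = reverse_code(score, n_points)
        total += score
    return total


# Hypothetical respondent: the third item (index 2) is negatively worded.
answers = ["Agree", "Strongly Agree", "Disagree"]
print(summated_score(answers, reversed_items={2}))  # 4 + 5 + (6 - 2) = 13
```

The same two functions work for longer scales if a matching label-to-code mapping and `n_points` are supplied.
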
The Likert scale is simple to construct and likely to produce a highly reliable scale. Moreover,
from the perspective of participants, it is easy to read and complete. On the other hand,
validity may be difficult to demonstrate with this scale and there is a lack of reproducibility.
Another weakness of the Likert scale is that participants may avoid extreme response categories,
which causes central tendency bias. Participants may also agree or disagree with the statements
simply to please the experimenter (acquiescence bias). Social desirability bias is another
weakness of the Likert scale: participants may not answer honestly and may instead try to
portray themselves in a more socially favorable light.
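
Two of the response styles mentioned above can be screened for with simple counts. The sketch below is only an illustration under stated assumptions: responses are integer codes from 1 to the number of scale points, the function name and the three indices are hypothetical, and no cut-off values are implied. Social desirability bias cannot be detected from counts of this kind.

```python
def response_style_indices(scores, n_points):
    """Return the shares of extreme, midpoint and agreeing responses.

    scores: integer codes in 1..n_points given by one respondent.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    # On even-length scales this value is not an integer, so no score matches it.
    midpoint = (n_points + 1) / 2
    extreme = sum(s in (1, n_points) for s in scores) / n
    middle = sum(s == midpoint for s in scores) / n
    agree = sum(s > midpoint for s in scores) / n
    return {"extreme": extreme, "midpoint": middle, "agree": agree}


# Hypothetical respondent on a 7-point scale who mostly picks the middle option.
print(response_style_indices([4, 4, 4, 5, 4, 3, 4, 4], n_points=7))
# {'extreme': 0.0, 'midpoint': 0.75, 'agree': 0.125}
```

A high midpoint share would be consistent with central tendency bias, and a high agree share across both positively and negatively worded items would be consistent with acquiescence.
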
III. COMPARISON OF RATING SCALES OF DIFFERENT LENGTHS


The effects of rating scales have been investigated in terms of psychometric quality criteria and 
systematic measurement error (Menold & Bogner, 2016). These criteria are reliability (the 
precision of a measurement), validity (the extent to which statements about the concepts to be 
measured can be made on the basis of the measurement results), response style (extreme or 
middle), respondent preferences and the respondent-friendliness of the survey.
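
Comparing scores obtained on scales of different lengths requires a common metric. One simple convention, assumed here for illustration and not attributed to any of the studies cited in this section, is to rescale each response linearly so that the lowest option maps to 0 and the highest to 100. The function name is hypothetical.

```python
def rescale_to_100(score, n_points):
    """Map a score on a 1..n_points scale linearly onto the range 0..100."""
    if n_points < 2:
        raise ValueError("a scale needs at least two points")
    if not 1 <= score <= n_points:
        raise ValueError("score is outside the scale")
    return (score - 1) / (n_points - 1) * 100


# The middle option of a 5-point and of a 7-point scale land on the same value.
print(rescale_to_100(3, 5))  # 50.0
print(rescale_to_100(4, 7))  # 50.0
print(rescale_to_100(7, 7))  # 100.0
```
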


A. Reliability 


According to Schutz and Rucker (1975), the number of response categories does not materially
influence the cognitive structure derived from the results. This suggests that it has little
effect on the results obtained; however, information retrieval is maximized by using six or
seven points (Green & Rao, 1970).


Symonds (1924) reported that inter-rater reliability is optimized using 7-point scales.
Similarly, McKelvie (1978) and Nunnally (1967) found that reliability is maximized with 7-point
options. On the other hand, some researchers claimed that reliability is independent of the
number of response options (Brown, Wilding, & Coulter, 1991; Matell & Jacoby, 1971).


Preston and Colman (2000) analyzed test-retest reliability coefficients and Cronbach's alpha
coefficients for internal consistency reliability. They found that test-retest reliability is
highest for scales with 7 to 10 response options and lowest for 3-point scales. Furthermore,
they reported that the Cronbach's alpha coefficient is highest for 11-point scales, with 7-point
scales differing very little. As with test-retest reliability, the lowest value is for 3-point
scales. Therefore, it could be concluded that reliability increases with the number of response
options, although from 7-point to 11-point scales the reliability results are all very similar.
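
For readers who want to compute the two reliability indices discussed above on their own data, the following is a minimal sketch using only the Python standard library. It uses the standard formula for Cronbach's alpha, k / (k - 1) * (1 - sum of item variances / variance of total scores), and the Pearson correlation between two administrations as the test-retest coefficient. The function names and the sample data are hypothetical, and this is not a reproduction of the analysis by Preston and Colman (2000).

```python
from statistics import mean, variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item scores per respondent."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*item_scores))  # regroup the scores by item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def pearson_r(x, y):
    """Pearson correlation, used here as a test-retest reliability coefficient."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical data: five respondents, three items on a 7-point scale.
first_session = [[2, 3, 2], [4, 4, 5], [6, 5, 6], [7, 7, 6], [3, 2, 3]]
second_session = [[2, 2, 3], [4, 5, 5], [6, 6, 5], [7, 6, 7], [3, 3, 2]]

alpha = cronbach_alpha(first_session)
retest = pearson_r([sum(r) for r in first_session],
                   [sum(r) for r in second_session])
print(round(alpha, 3), round(retest, 3))
```
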


B. Validity 


Loken, Pirie, Virnig, Hinkle, and Salmon (1987) examined the criterion validity of various
response categories and found that 11-point scales are superior to 3-point and 4-point scales.
In contrast, Matell and Jacoby (1971) reported that both reliability and validity are
independent of the number of scale points, so decreasing the number of response choices would
not decrease reliability or validity.


Chang (1994) reported higher convergent validity coefficients for the 6-point scale compared to
the 4-point scale, but found approximately similar criterion validity for both. Preston and
Colman (2000) compared scales with varying numbers of response categories in terms of criterion
validity and convergent validity. According to their report, the 9-point scale has the highest
criterion validity, although scores from five-point to eleven-point scales have very similar
criterion validity. Their results showed that scales with relatively more response categories
(six or more) have higher convergent validity. Altogether, validity tends to increase as the
number of scale points increases.
C. Response Preference


Jones (1968) studied respondents' preferences for 2-point and 7-point scales and reported that
respondents found the 2-point scales easier to use, though they judged the 7-point scales to be
more accurate, more interesting and less ambiguous. Jones (1968) concluded that respondents
clearly preferred multiple-category over dichotomous scales.


More recently, Preston and Colman (2000) examined respondent preferences in terms of “ease of
use”, “quick to use” and “express feelings adequately”. In this study, respondents rated their
level of preference from 0 to 100. The results showed that five-point, ten-point and seven-point
scales scored highest for “ease of use”. For “quick to use”, on the other hand, shorter scales
received the highest preference scores: three-point, two-point and four-point rating scales were
the most preferred. In contrast to the previous two criteria, for “express feelings adequately”,
rating scales with more options obtained higher ratings from respondents. Preston and Colman
(2000) concluded that respondents most preferred the 10-point scale, closely followed by the
7-point and 9-point scales.


D. Odd or Even Number of Response Options 


Another issue that has drawn researchers' attention in developing rating scales is whether
attitude and rating scales should include an even or odd number of response options (Kulas &
Stachowski, 2013; Nadler, Weston, & Voyles, 2015). Krosnick (1991) suggested using scales with a
midpoint. He mentioned that participants who wish to satisfice will look for a way to do so, and
if no such way is obvious they will optimize instead. He concluded that if the midpoint is not
provided, respondents will optimize, whereas scales with a middle alternative may discourage
respondents from taking a side. In brief, he claimed that although scales with a midpoint have
lower reliability, they facilitate the collection of more useful data. Additionally, according
to Colman and Norris (1997), odd numbers of response categories have generally been preferred to
even numbers because they allow the middle category to be interpreted as a neutral point, which
gives an option to a person who truly holds a neutral position and avoids forcing respondents to
take a side. As for the ordering of the scale, there is no specific recommendation, since it has
no effect on psychometric measurement quality criteria. Thus researchers can arrange rating
scales in either ascending or descending order (Menold & Bogner, 2016).
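
The practical difference between odd and even scale lengths can be stated in a few lines of code. This sketch assumes options are coded 1 to the number of scale points; the function names are hypothetical.

```python
def neutral_midpoint(n_points):
    """Return the neutral middle option of a 1..n_points scale, or None if it has none."""
    if n_points < 2:
        raise ValueError("a scale needs at least two points")
    return (n_points + 1) // 2 if n_points % 2 == 1 else None


def forces_a_side(n_points):
    """An even-length scale has no neutral option, so every answer leans one way."""
    return neutral_midpoint(n_points) is None


print(neutral_midpoint(7))  # 4
print(neutral_midpoint(6))  # None
print(forces_a_side(6))     # True
```
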


E. Visual Analog Response Scales 


Visual analog scales (Flynn, van Schaik, & van Wersch, 2004) are a continuous type of
measurement. According to Simms et al. (2019), visual analog scales offer no psychometric
advantage over traditional rating scales. However, non-task-related graphical elements such as
colors, shading or symbols should be used with caution in scales because they may affect
respondents’ choices.
IV. DISCUSSION AND CONCLUSION


According to Preston and Colman (2000), indices of reliability, validity, and discriminating
power were significantly higher for scales with more response categories, up to about 7,
although internal consistency did not differ significantly between scales. On the other hand,
respondent preferences were highest for the 10-point scale, closely followed by the 7-point
option (Preston & Colman, 2000). In addition, Miller (1956) argued that the human mind has a
span of absolute judgment that can distinguish about seven distinct categories, a span of
immediate memory for about seven items, and a span of attention that can encompass about six
objects at a time, which suggests that any increase in the number of response categories beyond
six or seven might be futile.


Although Matell and Jacoby (1971) argued that the number of response options does not affect
reliability and validity, some studies showed that reliability increases from 2-point to 6-point
or 7-point scales (Nunnally, 1967; Symonds, 1924). Studies also indicate that validity increases
with six or more response options (Chang, 1994; Hancock & Klockars, 1991; Preston & Colman,
2000). Furthermore, according to Preston and Colman (2000), 5-point, 7-point and 10-point scales
are relatively easy to use. Although shorter rating scales are rated as relatively quick to use,
scales with 10 and 11 alternatives were much preferred for expressing respondents' feelings
adequately. They concluded that 10-point, 9-point and 7-point scales are the most preferred
rating scales (Preston & Colman, 2000). Moreover, rating scales that are too short cannot reveal
much about the distinctions a person makes among a large set of objects. Consistent with this
notion, a number of studies showed that longer scales conveyed more useful information up to
7 to 9 points (Bendig, 1954), and information transfer appears to decrease for scales of
12 points or longer (McRae, 1970).
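
To make the idea of information carried by a scale concrete, the sketch below computes two simple quantities: the upper bound of log2(k) bits that a single response on a k-point scale can carry, and the Shannon entropy of an observed set of responses. This is an illustration under stated assumptions only. It is not the information transmission measure used by Bendig (1954) or McRae (1970), and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def max_bits(n_points):
    """Upper bound on the information (in bits) of one response on an n-point scale."""
    return log2(n_points)


def response_entropy(scores):
    """Shannon entropy (in bits) of the observed distribution of responses."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    counts = Counter(scores)
    return -sum(c / n * log2(c / n) for c in counts.values())


# Capacity grows slowly with scale length: about 2.32, 2.81 and 3.46 bits.
for k in (5, 7, 11):
    print(k, round(max_bits(k), 2))

# Responses spread evenly over four options carry 2 bits each on average.
print(response_entropy([1, 2, 3, 4]))  # 2.0
```

The bound shows why added options yield diminishing returns on paper; whether respondents actually use the extra options is an empirical matter that the entropy of real responses can help describe.
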


Colman and Norris (1997) mentioned that the majority of rating scales, Likert scales and other
attitude scales contain either five or seven response alternatives. Lewis (1993) concluded that
7-point scales correlate more strongly with observed significance levels than 5-point scales. In
addition, Finstad (2010) pointed out that seven-point scales are more likely than five-point
options to reflect respondents’ true subjective evaluation of a usability questionnaire item.
Although Bouranta, Chitiris, and Paravantis (2009) suggested that 5-point rating scales are less
confusing and increase the response rate, Diefenbach, Weinstein, and O’Reilly (1993) reported
that seven-point item scales emerged as the best overall and were reported by respondents as the
most accurate and the easiest to use. In this regard, Simms et al. (2019) mentioned that there
is a small to non-existent difference between six-point and seven-point scales.


Taken as a whole, this study suggests using a seven-point rating scale; if respondents need to
be directed to one side, a six-point scale is the most suitable.